Shanghai Jiao Tong University participation in high-level feature extraction and surveillance event detection at TRECVID 2009

Authors

  • Xiaokang Yang
  • Yi Xu
  • Rui Zhang
  • Erkang Chen
  • Qing Yan
  • Bo Xiao
  • Zhou Yu
  • Ning Li
  • Zuo Huang
  • Cong Zhang
  • Xiaolin Chen
  • Anwen Liu
  • Zhenfei Chu
  • Kai Guo
  • Jun Huang
Abstract

In this paper, we describe our participation in high-level feature extraction, automatic search and surveillance event detection at the TRECVID 2009 evaluation.

In high-level feature extraction, we establish a common feature set for all the predefined concepts, including global features and local features extracted from the keyframes. For the concepts related to person activity, space-time interest points are also used. Detection of ROIs and faces is needed for some special concepts, such as playing instrument and female face close-up. Classifiers are trained using these features, and linear weighted fusion of the classification results is used as the baseline. In particular, simple average fusion already works well. Further, ASR and IB re-ranking are used to improve the overall performance. We submitted the following six runs:

  • A_SJTU_ICIP_Lab317_1: Average fusion of classification results using global and local features; SVM classifiers are trained on TRECVID2009 development data
  • A_SJTU_ICIP_Lab317_2: Linear weighted fusion of classification results using global and local features; SVM classifiers are trained on TRECVID2009 development data
  • A_SJTU_ICIP_Lab317_3: Max of RUN1 and RUN2, re-ranked using ASR
  • A_SJTU_ICIP_Lab317_4: Max of RUN1 and RUN2, re-ranked using IB re-ranking
  • A_SJTU_ICIP_Lab317_5: Based on the result of RUN3, combining ASR and IB re-ranking
  • A_SJTU_ICIP_Lab317_6: Max of all runs

In event detection, trajectory features obtained from human tracking and optical flow computation, together with local appearance and shape features, are employed in event model training. For particular event detection tasks, several detection rules are tested using HMM models, boosted classifiers, matching and heuristic settings. We provide detection results for eight of the 10 required events for performance evaluation.

  • SJTU_2009_retroED_EVAL09_ENG_s-camera_p-baseline_1: Event detection based on human tracking, motion detection and gesture recognition

1 High-level Feature Extraction

1.1 Overview

In TRECVID2009, we explore several novel technologies to help detect high-level concepts. We divide the 20 concepts into 3 groups: concepts on object and scene, person action, and face detection. We extract different features to suit the different concept detection tasks. There are four main steps in our framework, as shown in Fig. 1:

Figure 1: High-level feature extraction framework

  • Low level feature extraction: We extract several low-level features, including global features, local features and other task-specific features. The global features comprise two kinds of color features (CM: Color Moment, CAC: Color Auto-Correlograms), two kinds of complex features (EOAC: Edge Orientation Auto-Correlograms, ERCAC: Edge Region Color Auto-Correlogram) and LBP (Local Binary Patterns) features. The local features are mainly SIFT features, which are represented as a bag-of-visual-words (BoWs). For concepts about person activity, we use Space-Time Interest Points (STIP).
  • Model: We adopt Support Vector Machines [1] as our classification method, training an individual SVM classifier for each low-level feature based on valid cross-database learning on the TRECVID2009 development data.
  • Ranking: Simple average fusion and linear weighted fusion are used to combine the multiple ranking results obtained from all the trained models.
  • Re-ranking: We extract textual information based on automatic speech recognition (ASR) and the information bottleneck (IB) principle. By adding a positive textual relevance factor to the previous ranking result, we obtain the re-ranked results.
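The ranking and re-ranking steps above can be illustrated with a minimal sketch. It assumes each classifier outputs one score per shot; the function names, the normalization of the weighted sum, and the `alpha` scaling of the textual factor are hypothetical choices made for illustration, not details taken from the submitted system.

```python
def average_fusion(score_lists):
    """Average the per-shot scores of several classifiers."""
    n = len(score_lists)
    return [sum(col) / n for col in zip(*score_lists)]


def weighted_fusion(score_lists, weights):
    """Linear weighted fusion; dividing by the weight sum is an assumption."""
    total = sum(weights)
    return [
        sum(w * s for w, s in zip(weights, col)) / total
        for col in zip(*score_lists)
    ]


def max_fusion(run_a, run_b):
    """Per-shot maximum of two runs (as in 'Max of RUN1 and RUN2')."""
    return [max(a, b) for a, b in zip(run_a, run_b)]


def rerank(scores, text_relevance, alpha=0.5):
    """Add a non-negative textual relevance factor to each score.

    alpha is a hypothetical scaling parameter, not from the paper.
    """
    return [s + alpha * max(t, 0.0) for s, t in zip(scores, text_relevance)]


def ranking(scores):
    """Shot indices sorted from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Toy scores for three shots from two per-feature classifiers.
cm_scores = [0.5, 0.25, 1.0]
sift_scores = [0.25, 0.75, 0.5]

run1 = average_fusion([cm_scores, sift_scores])
run2 = weighted_fusion([cm_scores, sift_scores], [3, 1])
run3 = rerank(max_fusion(run1, run2), [1.0, 0.0, 0.0])
print(ranking(run3))
```

In this toy example the first shot moves to the top of the list only because of its textual relevance, which is the intended effect of an additive re-ranking factor.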
1.2 Low level feature extraction

1.2.1 Global feature

We establish five baseline low-level features, four of which were used in our TRECVID 2007 system: two kinds of color features (CAC, 166 dim; CM, 225 dim, 5*5 grids), one texture feature (Local Binary Pattern (LBP), 531 dim, 3*3 grids) and one shape feature (Edge Orientation Auto-Correlograms (EOAC), 144 dim). We also propose a novel type of regional feature, called the Edge Region Color Auto-Correlogram (ERCAC, 166 dim). It aims to characterize an image using color and shape features jointly, capturing both the color distribution of the image and the spatial correlation of edge points.

1.2.2 Local feature

Besides global features, we also extract local features (i.e. SIFT) from keyframes of the detected shots. We compute SIFT features at integrated Difference of Gaussian (DoG) interest points [10]. The keyframe can then be described as a bag-of-visual-words (BoWs), where k-means is adopted to cluster the local features and each cluster is represented as a visual word. Accordingly, each keyframe is described in terms of a visual dictionary, or vocabulary. An SVM can then be used to classify the concept of each shot based on the histogram over the vocabulary.

The most important issue is how to determine the size of the visual vocabulary, which greatly influences the performance of BoWs. A smaller vocabulary might not capture the whole content of the keyframe, while a larger vocabulary wastes computation and introduces redundant information. We have conducted a large number of experiments over the TRECVID2009 development dataset with vocabularies of different sizes. The detections resulting from the different vocabularies are fused to get a stable result.

For some special concepts like traffic-intersection, we use the pyramid histogram of words (PHOW) to improve the overall detection performance. PHOW divides the region of interest in the keyframe into four parts, and combines the four histograms with the original histogram. The resulting histogram thus contains more spatial information, so good performance is expected on scene concepts.

1.2.3 Space-Time Interest Points

For the six concepts of human activity, STIP computes locations and descriptors for space-time interest points in video. The detector is an extension of the Harris operator to the space-time domain. HOG (Histograms of Oriented Gradients) descriptors are computed for the volumetric video slices around the detected space-time interest points. In the experiments, we apply STIP directly to the whole video sequences instead of the keyframes.

1.2.4 Special feature for some tasks

ROI feature

For the concepts about human activity, extraction of the Region Of Interest (ROI) is needed. In order to locate people's body parts, an edge-based deformable model is matched to the keyframe. We use a conditional random field (CRF) to obtain such deformable models. Finally, the Pyramid Histogram of Oriented Gradients (PHOG) is extracted over the ROI as the feature.
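As a rough illustration of the bag-of-visual-words and PHOW descriptions in Section 1.2.2, the sketch below quantizes local descriptors against a given vocabulary and appends four regional histograms to the global one. It assumes the vocabulary (cluster centers) has already been learned, that the four parts form a 2x2 grid, and that "combining" means concatenation; these are assumptions for illustration, and all function names are hypothetical.

```python
def nearest_word(descriptor, vocabulary):
    """Index of the closest visual word (squared Euclidean distance)."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(vocabulary)),
               key=lambda k: dist2(descriptor, vocabulary[k]))


def bow_histogram(descriptors, vocabulary):
    """Count how many descriptors fall on each visual word."""
    hist = [0] * len(vocabulary)
    for d in descriptors:
        hist[nearest_word(d, vocabulary)] += 1
    return hist


def phow_histogram(points, descriptors, vocabulary, width, height):
    """Global histogram followed by four quadrant histograms.

    The 2x2 split and the concatenation order are assumptions.
    """
    cells = [[] for _ in range(4)]
    for (x, y), d in zip(points, descriptors):
        col = 1 if x >= width / 2 else 0
        row = 1 if y >= height / 2 else 0
        cells[2 * row + col].append(d)
    result = bow_histogram(descriptors, vocabulary)
    for cell in cells:
        result += bow_histogram(cell, vocabulary)
    return result


# Toy data: a two-word vocabulary and four 2-D "descriptors",
# one located in each quadrant of a 100x100 region.
vocabulary = [(0.0, 0.0), (1.0, 1.0)]
points = [(10, 10), (90, 10), (10, 90), (90, 90)]
descriptors = [(0.1, 0.0), (0.9, 1.0), (0.2, 0.1), (1.0, 0.8)]

print(phow_histogram(points, descriptors, vocabulary, 100, 100))
```

The output vector is five times the vocabulary size, which makes concrete why vocabulary size matters for both descriptive power and computational cost.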



Publication date: 2009